Abstract. Here we introduce researchers in algebraic biology to the exciting new field of cophylogenetics. Cophylogenetics is the study of concomitantly evolving organisms (or genes), such as host and parasite species. Thus the natural objects of study in cophylogenetics are tuples of related trees, instead of individual trees. We review various research topics in algebraic statistics for phylogenetics, and propose analogs for cophylogenetics. In particular we propose spaces of cophylogenetic trees, cophylogenetic reconstruction, and cophylogenetic invariants. We conclude with open problems.



1 Introduction 

Phylogenetics has provided an abundant source of applications for algebraic statistics, with research areas including phylogenetic invariants, the geometry of tree space, and analysis of phylogenetic reconstruction. Traditionally these applications, like phylogenetics at large, focused on the common ancestry among a single set of species (or set of gene homologs).

On the biological front, however, phylogenetic research has since expanded to 
include other types of evolutionary relationships besides common ancestry. Here 
we present one of the largest new research topics in phylogenetics, called cophy- 
logenetics, and we explore possible applications of algebraic statistics. Just as 
phylogenetics can be loosely described as the study of evolution, cophylogenetics 
is essentially the study of coevolution. Coevolution is the concomitant evolution 
and speciation of one species (or gene) with another. In biology there are two 
actively studied examples of coevolution: 

— Host-parasite coevolution (or more generally, symbiont coevolution): Sym- 
biont species interact with one another and often migrate together, and thus 
tend to have parallel lineages during evolution and speciation. Thus sym- 
bionts often have similar phylogenetic trees. 

— Gene trees and species trees: Loosely speaking, genes can be thought of as 
"symbionts" living within a species. Thus gene trees are often similar to one 
another and the species tree. 
Fig. 1. Phylogenetic trees for gopher and louse data sets [15] constructed via BEAST. Hosts and their parasites are indicated by connecting dashed lines.



There have been many studies of coevolution of clades of host species and their corresponding clades of parasite species ([5S] and its references). One well-known example is the set of gopher/louse pairs reported in [15]. Even though there is significant evidence of parallel evolution between gophers and their lice, their reconstructed tree topologies differ (Fig. 1). In fact, reconstructed host and parasite trees are rarely identical. This disagreement could be due to reconstruction errors, caused either by noise in the input data or by heuristic reconstruction methods, or the host and parasite trees could be truly different.

As host and parasite coevolve, there are six commonly recognized types of events which can occur along lineages [25]. These are shown in Fig. 2:

(a) A host and a parasite cospeciate, i.e., they speciate together.

(b) A parasite changes its host (host switch), which is equivalent to a gene transfer in gene trees.

(c) A parasite speciates independently of its host.

(d) A parasite goes extinct.

(e) A parasite fails to colonize all descendants of a speciating host lineage.

(f) A parasite fails to speciate.

Notice that in the figure, events (b) through (f) can cause host and parasite tree topologies to differ.

Analogous to host/parasite relationships, gene trees for a given set of species can vary from gene to gene and also differ from the species tree. In statistics, the relation between gene and species trees is well understood in terms of coalescent processes [16]. However, coalescent models usually assume that genes cannot be transferred between members of different species. In reality, microbial organisms, for example, can exchange genetic material in a process called lateral gene transfer [9], which is analogous to host switching. Just as host switching can cause parasite trees to disagree with host trees, lateral gene transfer can cause gene trees to disagree with species trees. Combinatorially, these mechanisms correspond to subtree prune and regraft (SPR) operations [29], which are discussed further in Section 2.

Fig. 2. Evolutionary events which can occur during host-parasite coevolution.

Many techniques have been developed to compare gene trees [1911013114138123137118], and host and parasite trees [6, 30, 17, 15].

In existing methods for comparing host and parasite trees, point estimates of the trees are made first, and then the trees are compared to one another. However, estimating each tree separately can exaggerate the true differences between the trees. Thus one of our main tenets in this paper is that when reconstructing and studying related trees, researchers in mathematical phylogenetics should explicitly consider tuples of trees, which we call cophylogenies:

Definition 1. Let $\mathcal{T}_H$ and $\mathcal{T}_P$ be spaces of trees for two sets of taxa $H$ and $P$. A cophylogeny is any pair of trees $(T_H, T_P) \in \mathcal{T}_H \times \mathcal{T}_P$.

More generally, for tree spaces $\mathcal{T}_{G_1}, \dots, \mathcal{T}_{G_m}$, a cophylogeny is any tuple $(T_{G_1}, \dots, T_{G_m}) \in \mathcal{T}_{G_1} \times \cdots \times \mathcal{T}_{G_m}$. However, in this paper we will focus mainly on the case of two sets of taxa, $H$ and $P$.

There has been much study of the underlying combinatorial, algebraic, and polyhedral geometric structures for phylogenetic trees defined on a fixed set of species (see [23] and its references). However, for cophylogenies, particularly cophylogenies of trees which are presumed to be related, the combinatorial, algebraic, and polyhedral geometric structures have received little attention. Thus we propose extending mathematical phylogenetics to include cophylogenetics. To this end we introduce spaces of cophylogenetic trees, cophylogenetic reconstruction problems, cophylogenetic invariants, and new geometries of tree space.

2 Spaces of cophylogenetic trees 

In this paper, we assume all trees are unrooted unless otherwise specified. Nevertheless, almost all results are also directly applicable to rooted trees, e.g., by attaching a designated "root leaf" to convert rooted trees into unrooted trees. We also assume that all trees have $n$ leaves, which will usually be labeled $1, 2, \dots, n$.

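Definition 2. A dissimilarity map on $\{1, 2, \dots, n\}$ is an $n \times n$ symmetric matrix $D$ with zeroes on the diagonal and all other entries positive.

Equivalently, the set of dissimilarity maps can be identified with the positive orthant of $\mathbb{R}^{\binom{n}{2}}$, with one coordinate per unordered pair of leaves.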

Definition 3. Let $D$ be a dissimilarity map. $D$ is a tree metric if there exists an (edge-weighted) tree $T$ with leaves $\{1, 2, \dots, n\}$ such that all edge weights in $T$ are positive and, for every pair $i, j$, the entry $d_{ij}$ equals the sum of the edge weights along the path from $i$ to $j$.

Equivalently, tree metrics can be defined by the Four Point Condition.

Theorem 1 (Four Point Condition [5]). Let $D$ be a dissimilarity map. Then $D$ is a tree metric if and only if for all possible distinct leaves $i, j, k, l$, the maximum of $\{d_{ij} + d_{kl},\ d_{ik} + d_{jl},\ d_{il} + d_{jk}\}$ is achieved at least twice.

By Theorem 1, the set of tree metrics on $\{1, 2, \dots, n\}$ can be realized as a union of cones in $\mathbb{R}^{\binom{n}{2}}$ [24].
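As a small illustration of Theorem 1, the Four Point Condition can be checked directly on a dissimilarity map. The sketch below is our own, not code from the cited work; it assumes $D$ is a symmetric Python list of lists indexed by leaves $0, \dots, n-1$, and the floating-point tolerance `tol` is an assumed parameter.

```python
from itertools import combinations


def satisfies_four_point(D, tol=1e-9):
    """Check the Four Point Condition (Theorem 1) for a symmetric matrix D.

    For every 4-subset {i, j, k, l} of leaves, the largest of the three pair
    sums d_ij + d_kl, d_ik + d_jl, d_il + d_jk must be attained at least twice
    (compared up to the tolerance `tol`).
    """
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if sums[2] - sums[1] > tol:  # unique maximum: not a tree metric
            return False
    return True


# Quartet tree 01|23 with all five edges of length 1.
quartet = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
# A dissimilarity map in which d_01 + d_23 is the unique maximum.
not_a_tree = [[0, 1, 2, 2], [1, 0, 2, 2], [2, 2, 0, 5], [2, 2, 5, 0]]

print(satisfies_four_point(quartet))     # True
print(satisfies_four_point(not_a_tree))  # False
```

Checking only sorted 4-subsets suffices because the three pair sums are the same set for every ordering of the four leaves.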

Definition 4. A subset $S \subseteq \mathcal{T}_H \times \mathcal{T}_P$ is called a space of cophylogenetic trees. If $\mathcal{T}_H$ and $\mathcal{T}_P$ are spaces of tree topologies (instead of tree metrics), then $S \subseteq \mathcal{T}_H \times \mathcal{T}_P$ is called a space of cophylogenetic tree topologies.

Our definition is deliberately vague, as there are many spaces of cophylogenetic 
trees which are biologically and mathematically interesting. 

Example 1. Even if trees $T_H$ and $T_P$ have the same topology, variable rates of evolution can cause edge lengths to differ between the trees. Thus we can consider the "topology diagonal" of $\mathcal{T}_H \times \mathcal{T}_P$:

$$S = \{(D_H, D_P) : D_H \text{ and } D_P \text{ are tree metrics with the same underlying bifurcating tree topology}\}.$$



Analogous to the traditional space of trees, the topology diagonal is a union of $(2n-5)!!$ polyhedral cones, one for each bifurcating tree topology, and can be defined by an extended Four Point Condition:

Proposition 1. If $D_H, D_P \in \mathbb{R}^{\binom{n}{2}}$ are dissimilarity maps, then $(D_H, D_P) \in S$ if and only if the Four Point Condition holds for the three dissimilarity maps $D_H$, $D_P$, and $D_H + D_P$ (where each maximum in the Four Point Condition is attained exactly twice). Thus the topological closure of $S$ is (the negation of) a tropical variety.

Proof. It suffices to prove that if $D_H, D_P$ are tree metrics with respective bifurcating tree topologies $T_H, T_P$, then $D_H + D_P$ is a tree metric if and only if $T_H = T_P$. So suppose $D_H = (d_{ij})$ and $D_P = (c_{ij})$ are tree metrics with underlying bifurcating tree topologies $T_H, T_P$. If $T_H = T_P$, then $D_H + D_P$ is also a tree metric with tree topology $T_H$, because tree metrics with topology $T_H$ form a cone. Conversely, if $T_H \neq T_P$, then there will be some choice of four taxa $i, j, k, l$ such that the quartet induced by $i, j, k, l$ in $T_H$ is different from the quartet induced by $i, j, k, l$ in $T_P$. By the Four Point Condition, in the matrix

$$\begin{pmatrix} d_{ij} + d_{kl} & d_{ik} + d_{jl} & d_{il} + d_{jk} \\ c_{ij} + c_{kl} & c_{ik} + c_{jl} & c_{il} + c_{jk} \end{pmatrix}$$

the maximum of each row is attained twice; in fact, exactly twice, since $T_H$ and $T_P$ are bifurcating. Thus each row attains a strict minimum. Furthermore, since $i, j, k, l$ induce different quartets in $T_H$ and $T_P$, the row minima must be in different columns. Thus, without loss of generality, we can write the above matrix as

$$\begin{pmatrix} x & x & y \\ w & z & z \end{pmatrix},$$

where $y < x$ and $w < z$. Summing the rows, we get the three numbers $\{x + w,\ x + z,\ y + z\}$, which attain a unique maximum $x + z$. Thus $D_H + D_P$ cannot satisfy the Four Point Condition, so $D_H + D_P$ is not a tree metric. □

Tropical varieties were first introduced to mathematical phylogenetics in [33], 
where it was shown that the space of trees is a tropical variety. We think it is 
an interesting problem to find other important spaces of cophylogenetic trees 
which are tropical varieties, or can be expressed by conditions involving linear 
equations and inequalities. 

Example 2 (Host switching/lateral gene transfer). As we noted earlier, host switching and lateral gene transfer correspond to subtree-prune-and-regraft (SPR) [29] operations on trees. In an SPR operation, a subtree is detached, or pruned, from the tree by cutting an edge, and reattached to the middle of a different edge. The SPR distance between two trees is the minimum number of SPR operations needed to transform one tree into the other. As SPR operations are fundamental in mathematical cophylogenetics, we define the $k$-SPR space of cophylogenetic trees as the set of all cophylogenies $(T_H, T_P)$ where $T_H, T_P$ have SPR distance no more than $k$. If $k = 1$, there are $2(n-3)(2n-7)$ tree topologies $T_H$ that are one SPR operation away from a given $T_P$, as described in Theorem 2.6.2].

However, for unrooted trees, the SPR distance only provides a lower bound on the number of host switches or lateral gene transfers that have occurred, because actual host switches and transfers must also be consistent with the trees' orientations with respect to time. To find the minimum number of host switches or lateral gene transfers, we must consider the SPR distance between rooted trees whose vertices are totally ordered with respect to time [31]. See [31] for a study of the analogous 1-SPR space of cophylogenetic trees in this setting. □
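To make the count $2(n-3)(2n-7)$ concrete, the brute-force sketch below enumerates the unrooted SPR neighbors of a tree by cutting each edge, suppressing the resulting degree-2 vertex, and regrafting the pruned subtree onto every remaining edge; trees are compared by their sets of nontrivial splits. The edge-list representation and the helper names (`spr_neighbors`, `caterpillar`) are our own illustrative choices, and the check is only meant for small $n$.

```python
from collections import defaultdict


def adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def component(adj, start, blocked):
    """Nodes reachable from `start` without stepping onto node `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y != blocked and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def splits(edges, n):
    """Canonical form of a tree: its nontrivial splits, each as the side avoiding leaf 0."""
    adj = adjacency(edges)
    out = set()
    for u, v in edges:
        side = {x for x in component(adj, v, u) if x < n}
        if 0 in side:
            side = set(range(n)) - side
        if 2 <= len(side) <= n - 2:
            out.add(frozenset(side))
    return frozenset(out)


def spr_neighbors(edges, n):
    """Split sets of all trees exactly one unrooted SPR move away from `edges`.

    Leaves are 0..n-1 and internal nodes have ids >= n. Cutting edge (u, v)
    prunes the subtree on v's side; u is suppressed and re-inserted on another edge.
    """
    adj = adjacency(edges)
    base = splits(edges, n)
    found = set()
    for a, b in edges:
        for u, v in ((a, b), (b, a)):
            if u < n:
                continue  # u is a leaf: nothing is left to regraft onto
            keep = component(adj, u, v)
            p, q = [y for y in adj[u] if y != v]
            pruned = [(x, y) for x, y in edges if x not in keep and y not in keep]
            rest = [(x, y) for x, y in edges if x in keep and y in keep and u not in (x, y)]
            rest.append((p, q))  # suppress the degree-2 vertex u
            for i, (x, y) in enumerate(rest):
                new = rest[:i] + rest[i + 1:] + [(x, u), (u, y), (u, v)] + pruned
                s = splits(new, n)
                if s != base:
                    found.add(s)
    return found


def caterpillar(n):
    """Caterpillar tree on leaves 0..n-1 (n >= 4) with internal nodes n..2n-3."""
    spine = list(range(n, 2 * n - 2))
    edges = [(0, spine[0]), (1, spine[0]), (n - 2, spine[-1]), (n - 1, spine[-1])]
    edges += [(spine[t], spine[t + 1]) for t in range(len(spine) - 1)]
    edges += [(leaf, spine[leaf - 1]) for leaf in range(2, n - 2)]
    return edges


for n in (4, 5, 6):
    # brute-force neighborhood size next to the closed formula 2(n-3)(2n-7)
    print(n, len(spr_neighbors(caterpillar(n), n)), 2 * (n - 3) * (2 * n - 7))
```

Regrafting onto the edge created by suppressing $u$ reproduces the original tree, which is why the base split set is excluded explicitly.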

Example 3 (Coalescent cophylogeny). In a coalescent cophylogeny, $T_H$ is a (rooted) species tree, and $T_P$ is a (rooted) gene tree generated from $T_H$ according to the coalescent process [16]. A coalescent history is a list of the branches of the species tree on which coalescences in the gene tree occur. We define the $k$-coalescent history space to be the set of all cophylogenies $(T_H, T_P)$ for which the coalescent history for $T_H$ and $T_P$ occurs with probability $> k$ out of all valid coalescent histories for $T_H$. Degnan and Salter [7] showed how to compute this probability. Note that the more dissimilar the topologies of $T_H$ and $T_P$, the smaller the number of valid coalescent histories, and the lower their probabilities. However, for any given species tree $T_H$, there is at least one valid coalescent history for each gene tree topology, namely the history in which all coalescent events occur before any speciation in $T_H$. □

Example 4. A nearest neighbor interchange (NNI) operation swaps two subtrees lying on opposite sides of a single internal edge of an unrooted tree, as illustrated in Fig. 3. The NNI distance between two trees is the minimum number of NNI operations needed to transform one tree into the other [26].

Analogous to the $k$-SPR space of cophylogenetic trees, we can define a space of cophylogenetic trees as the set of all cophylogenies $(T_H, T_P)$ where $T_H, T_P$ have NNI distance no more than $k$. In general, we can define a space of cophylogenetic trees as the set of all cophylogenies $(T_H, T_P)$ where $d(T_H, T_P) \le k$ for some distance or dissimilarity measure $d$ on tree metrics (or tree topologies).



Other choices for $d(T_H, T_P)$ might include the geodesic distance between $T_P$ and $T_H$ in tree space [4], the quartet distance [12], and the Robinson-Foulds symmetric difference [27]. □

Fig. 3. An NNI operation.



Example 5 ($k$-interval cospeciation). In host-parasite coevolution, a speciation in the host is likely to be followed by a reactionary speciation in the parasite, and vice versa. If a reactionary speciation is delayed long enough, then host and parasite tree topologies can disagree. In fact, if multiple consecutive speciations occur in the host before a reactionary speciation occurs in the parasite (or vice versa), then the tree topologies can be quite different.

Biologically, it is highly unlikely that a large number of consecutive speciations can accumulate in a host lineage without any reactionary speciation in the parasite. Thus, when reconstructing host and parasite trees, we might assume that only a bounded number of consecutive speciations can occur in any host lineage before a reactionary speciation in the parasite (and vice versa). Combinatorially, this implies that for each pair of host species $A, B$ and corresponding parasite species $a, b$, the number of edges between $A$ and $B$ is within $k$ of the number of edges between $a$ and $b$. We say such a cophylogeny satisfies $k$-interval cospeciation, and the set of all such cophylogenies is called the $k$-interval space of cophylogenetic trees. □

If we treat $T_H$ and $T_P$ as having edge weights of 1, then asking whether $T_H$ and $T_P$ satisfy $k$-interval cospeciation is equivalent to asking whether the $L_\infty$-norm of the difference between their dissimilarity maps is at most $k$. Although other norms on dissimilarity maps have been studied ([39], [34]), the $L_\infty$-norm appears to have only been used in the context of a map between two versions of tree space [21].
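Under this unit-edge-weight reading, checking $k$-interval cospeciation amounts to computing the largest change in leaf-to-leaf edge counts. The sketch below uses hypothetical helpers of our own (`tree`, `interval_gap`, `k_interval`); trees are assumed to be edge lists over string node labels, with leaves `a` to `e` and internal nodes `x`, `y`, `z` in the example.

```python
from collections import deque


def tree(edges):
    """Adjacency dict of an undirected tree given by its edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def leaf_distances(adj, leaves):
    """Number of edges between every ordered pair of leaves (all edge weights 1)."""
    dist = {}
    for s in leaves:
        depth, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        for t in leaves:
            dist[s, t] = depth[t]
    return dist


def interval_gap(adj_h, adj_p, leaves):
    """L-infinity distance between the unit-weight dissimilarity maps of two trees."""
    dh, dp = leaf_distances(adj_h, leaves), leaf_distances(adj_p, leaves)
    return max(abs(dh[key] - dp[key]) for key in dh)


def k_interval(adj_h, adj_p, leaves, k):
    return interval_gap(adj_h, adj_p, leaves) <= k


LEAVES = "abcde"
T_H = tree([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
T_P = tree([("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"), ("y", "z"), ("d", "z"), ("e", "z")])  # one NNI from T_H
T_Q = tree([("a", "x"), ("d", "x"), ("x", "y"), ("c", "y"), ("y", "z"), ("b", "z"), ("e", "z")])  # shares no split with T_H

print(interval_gap(T_H, T_P, LEAVES), k_interval(T_H, T_P, LEAVES, 1))  # 1 True
print(interval_gap(T_H, T_Q, LEAVES), k_interval(T_H, T_Q, LEAVES, 1))  # 2 False
```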

We believe that in cophylogenetics applications, distances based on $k$-interval cospeciation will be more useful than the NNI distance. There has been some work characterizing trees within a prescribed NNI distance [26], but to our knowledge the combinatorial properties of $k$-interval cospeciation have not been studied. We note that $k$-interval cospeciation and NNI distance are related concepts, and for $k = 1$ we have:

Theorem 2. Suppose $T_H, T_P$ are unrooted trees on the same set of leaves. Then $T_H$ and $T_P$ satisfy 1-interval cospeciation if and only if $T_H$ and $T_P$ differ by at most 1 NNI operation.

Proof. See Section 6.

Since there are $2n - 6$ bifurcating trees which are exactly one NNI move away from a given bifurcating tree with $n \ge 4$ leaves [26], each of the $(2n-5)!!$ bifurcating topologies is paired with itself and its $2n - 6$ NNI neighbors. Hence the 1-interval space of cophylogenetic trees on $n$ taxa contains $(2n-5) \cdot (2n-5)!!$ ordered pairs of bifurcating tree topologies when $n \ge 4$.
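Both this count and Theorem 2 itself can be checked exhaustively for small $n$. The sketch below is our own finite check, not a proof: it enumerates all bifurcating trees by stepwise leaf insertion, compares unit-weight distance matrices for every ordered pair, and tests "at most one NNI" through the standard fact that two bifurcating trees are one NNI apart exactly when they share all but one nontrivial split. The function names are illustrative.

```python
from collections import defaultdict, deque
from itertools import product


def all_trees(n):
    """All unrooted bifurcating trees on leaves 0..n-1 (n >= 3) as edge lists.

    Start from the 3-leaf star and insert leaf m on every edge of every tree on
    leaves 0..m-1; each topology is produced exactly once.
    """
    trees = [[(0, n), (1, n), (2, n)]]  # internal node ids are >= n
    for m in range(3, n):
        w = n + m - 2  # fresh internal node subdividing the chosen edge
        trees = [
            edges[:i] + edges[i + 1:] + [(u, w), (w, v), (m, w)]
            for edges in trees
            for i, (u, v) in enumerate(edges)
        ]
    return trees


def adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def leaf_distances(edges, n):
    """Number of edges between every pair of leaves (BFS from each leaf)."""
    adj = adjacency(edges)
    d = [[0] * n for _ in range(n)]
    for s in range(n):
        depth, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in depth:
                    depth[y] = depth[x] + 1
                    queue.append(y)
        d[s] = [depth[t] for t in range(n)]
    return d


def splits(edges, n):
    """Nontrivial splits, each stored as the side that avoids leaf 0."""
    adj = adjacency(edges)
    out = set()
    for u, v in edges:
        side, stack, seen = set(), [v], {u, v}
        while stack:
            x = stack.pop()
            if x < n:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if 0 in side:
            side = set(range(n)) - side
        if 2 <= len(side) <= n - 2:
            out.add(frozenset(side))
    return frozenset(out)


def one_interval_pairs(n):
    """Count ordered pairs with 1-interval cospeciation, asserting Theorem 2 on each pair."""
    trees = [(leaf_distances(e, n), splits(e, n)) for e in all_trees(n)]
    count = 0
    for (d1, s1), (d2, s2) in product(trees, repeat=2):
        one_interval = all(abs(d1[i][j] - d2[i][j]) <= 1 for i in range(n) for j in range(n))
        within_one_nni = len(s1 ^ s2) <= 2  # identical, or one split swapped
        assert one_interval == within_one_nni, "Theorem 2 would fail here"
        count += one_interval
    return count


def double_factorial(m):
    return 1 if m <= 1 else m * double_factorial(m - 2)


for n in (4, 5, 6):
    print(n, one_interval_pairs(n), (2 * n - 5) * double_factorial(2 * n - 5))
```

For $n = 4, 5, 6$ the two printed counts should agree, giving 9, 75, and 735 ordered pairs respectively.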

3 Cophylogenetic reconstruction 

In the popular viewpoint echoed in [21], distance-based methods for reconstructing phylogenetic trees from dissimilarity maps can be regarded as retractions from $\mathbb{R}^{\binom{n}{2}}$ to tree metrics. Due to their rich mathematical structure, two methods for phylogenetic reconstruction have received a great deal of attention in the mathematical biology community: neighbor joining (NJ) [28, 22] and balanced minimum evolution (BME) [8]. Intriguingly, Gascuel and Steel [13] have recently shown that NJ is a greedy heuristic for building BME trees; see also [11].

In this paper we propose distance-based cophylogenetic reconstruction, which means inferring a cophylogeny $(T_H, T_P)$ given an input tuple of dissimilarity maps $(D_H, D_P)$. Common methods for reconstructing $T_H$ and $T_P$ merely perform standard phylogenetic reconstruction, inferring $T_H$ from $D_H$ and $T_P$ from $D_P$. Ideally, methods for cophylogenetic reconstruction should incorporate constraints or mixed objective functions that account for similarity between $T_H$ and $T_P$. We believe cophylogenetic reconstruction is an important avenue of future research in mathematical phylogenetics.

Specifically, we propose studying constrained cophylogenetic reconstruction and minimum coevolution methods. Due to the widespread mathematical interest in NJ and BME, we will focus on cophylogenetic reconstruction based on these methods. See [11] for a description of, and references on, NJ and BME; here we only briefly introduce basic facts and notation for the BME method. Given a dissimilarity map $D$, assumed to be a noisy observation of a tree metric $D_T$, BME chooses a tree topology $T$ whose sum of estimated branch lengths is minimal. The sum of estimated branch lengths is also called the length of $T$, written $\ell(T)$. The formulation of BME has a rich and elegant structure: for each topology $T$ we have $\ell(T) = b(T) \cdot D$, where $b(T)$ is the BME vector for $T$ (which does not depend on $D$). Thus BME is equivalent to minimizing the linear functional $D$ over the polytope $B = \operatorname{conv}\{b(T)\}_T$, which is called the BME polytope [11].
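For a bifurcating tree, one concrete form of $b(T)$ is Pauplin's formula, $\ell(T) = \sum_{i<j} 2^{1-\tau_{ij}} d_{ij}$, where $\tau_{ij}$ is the number of edges on the path between leaves $i$ and $j$; we assume this normalization here, and other references rescale $b(T)$ by a positive constant, which does not change the minimizer. The sketch below, with illustrative names and a hand-built quartet metric, evaluates $b(T) \cdot D$ for the three quartet topologies and picks the smallest, which is BME on four taxa.

```python
from collections import defaultdict, deque


def topological_distances(edges, n):
    """tau[i][j]: number of edges on the path between leaves i and j."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    tau = []
    for s in range(n):
        depth, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in depth:
                    depth[y] = depth[x] + 1
                    queue.append(y)
        tau.append([depth[t] for t in range(n)])
    return tau


def bme_vector(edges, n):
    """Pauplin weights b_ij = 2**(1 - tau_ij) for each leaf pair i < j."""
    tau = topological_distances(edges, n)
    return {(i, j): 2.0 ** (1 - tau[i][j]) for i in range(n) for j in range(i + 1, n)}


def bme_length(edges, D):
    """Estimated tree length l(T) = b(T) . D."""
    return sum(w * D[i][j] for (i, j), w in bme_vector(edges, len(D)).items())


# The three quartet topologies on leaves 0..3 (internal nodes 4 and 5).
QUARTETS = {
    "01|23": [(0, 4), (1, 4), (4, 5), (2, 5), (3, 5)],
    "02|13": [(0, 4), (2, 4), (4, 5), (1, 5), (3, 5)],
    "03|12": [(0, 4), (3, 4), (4, 5), (1, 5), (2, 5)],
}
# Tree metric of 01|23 with pendant lengths 1, 2, 3, 4 and internal length 0.5.
D = [[0, 3, 4.5, 5.5], [3, 0, 5.5, 6.5], [4.5, 5.5, 0, 7], [5.5, 6.5, 7, 0]]

lengths = {name: bme_length(edges, D) for name, edges in QUARTETS.items()}
print(lengths)                        # {'01|23': 10.5, '02|13': 10.75, '03|12': 10.75}
print(min(lengths, key=lengths.get))  # 01|23
```

On the true topology the estimate 10.5 equals the total edge length $1 + 2 + 3 + 4 + 0.5$, and both wrong topologies receive a larger length.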



3.1 Retraction onto spaces of cophylogenetic trees 

Fix a space of cophylogenetic trees $S \subseteq \mathcal{T}_H \times \mathcal{T}_P \subseteq \mathbb{R}^{\binom{n}{2}} \times \mathbb{R}^{\binom{n}{2}}$. A constrained cophylogenetic reconstruction method is a retraction $\mathbb{R}^{\binom{n}{2}} \times \mathbb{R}^{\binom{n}{2}} \to S$, i.e., a mapping which restricts to the identity map on $S$. We believe retractions onto spaces of cophylogenetic trees are an important new field of study in mathematical biology.

In particular, we can formulate constrained joint BME: given dissimilarity maps $D_H, D_P$, find a pair of tree topologies $(T_H, T_P) \in S$ whose length sum $\ell(T_H) + \ell(T_P)$ is minimal (where the tree length $\ell(T)$ is defined as in BME). Similarly, we can define the joint BME polytope, which is the subpolytope $B' \subseteq B \times B \subseteq \mathbb{R}^{\binom{n}{2}} \times \mathbb{R}^{\binom{n}{2}}$ whose vertices correspond to pairs of tree topologies $(T_H, T_P) \in S$.

Constrained joint BME is a very new research topic. Recently Matsen [20] has studied constrained joint BME under the $k$-SPR space of rooted cophylogenetic trees. In the spirit of [T5], Matsen devises a neighbor-joining algorithm which, given dissimilarity maps $D_H, D_P$, finds a pair of trees $(T_H, T_P)$ within $k$ rooted SPR moves of one another; Matsen's algorithm attempts to minimize the sum of tree lengths $D_H \cdot b(T_H) + D_P \cdot b(T_P)$. Matsen's work [20] is the only work of its kind, and so far it has not been extended to other spaces of cophylogenetic trees, such as the $k$-NNI or $k$-interval spaces. We believe this is an important direction for future research. Moreover,



joint BME polytopes have not yet been studied for any space of cophylogenetic 
trees, and we believe this is an interesting geometric problem in light of 

3.2 Balanced minimum coevolution 

Constrained joint BME can be regarded as a constrained optimization problem: minimize the sum of tree lengths $\ell(T_H) + \ell(T_P)$, subject to $(T_H, T_P) \in S$. Alternatively, given a distance measure $d(T_H, T_P)$ between trees or tree topologies, we can define the total coevolution $\ell(T_H, T_P)$ as

$$\ell(T_H, T_P) = \ell(T_H) + \ell(T_P) + d(T_H, T_P),$$

where $\ell(T_H)$ and $\ell(T_P)$ are defined as in standard BME. We call the pair of trees $(T_H, T_P)$ which minimizes $\ell(T_H, T_P)$ the balanced minimum coevolution (BMC) cophylogeny. So far BMC has not been studied for any choice of distance measure between trees. Natural choices of $d(T_H, T_P)$ for initial study might include SPR distance, NNI distance, and $k$-interval distance. Also, analogous to [13], we can ask whether there is a fast heuristic like NJ for finding BMC cophylogenies.
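A brute-force sketch of BMC on four taxa: it enumerates all pairs of quartet topologies and minimizes $\ell(T_H) + \ell(T_P) + d(T_H, T_P)$, using Pauplin's formula for $\ell$ (an assumed normalization of $b(T)$). Taking $d$ to be $0$ on $S$ and $+\infty$ off $S$ recovers constrained joint BME from Section 3.1, so both problems fit in one function. The distance functions and the matrices `D_H`, `D_P` are illustrative choices, not data from the paper.

```python
from itertools import combinations, product


def quartet_length(split, D):
    """BME length of the quartet tree whose cherries are split[0] and split[1].

    Pauplin's formula on four leaves: weight 1/2 on the two cherry pairs and
    1/4 on the other four pairs.
    """
    cherries = {frozenset(split[0]), frozenset(split[1])}
    return sum((0.5 if frozenset(pair) in cherries else 0.25) * D[pair[0]][pair[1]]
               for pair in combinations(range(4), 2))


TOPOLOGIES = {"01|23": ((0, 1), (2, 3)), "02|13": ((0, 2), (1, 3)), "03|12": ((0, 3), (1, 2))}


def bmc(D_H, D_P, dist):
    """Brute-force minimizer of l(T_H) + l(T_P) + dist(T_H, T_P) over quartet pairs."""
    def total(pair):
        h, p = pair
        return quartet_length(TOPOLOGIES[h], D_H) + quartet_length(TOPOLOGIES[p], D_P) + dist(h, p)
    return min(product(TOPOLOGIES, repeat=2), key=total)


def no_penalty(h, p):     # d = 0: independent BME on each tree
    return 0.0


def nni_distance(h, p):   # distinct quartet trees are exactly one NNI apart
    return 0.0 if h == p else 1.0


def diagonal_only(h, p):  # 0 on S (topology diagonal), +inf off S: constrained joint BME
    return 0.0 if h == p else float("inf")


D_H = [[0, 3, 4.5, 5.5], [3, 0, 5.5, 6.5], [4.5, 5.5, 0, 7], [5.5, 6.5, 7, 0]]        # metric of 01|23
D_P = [[0, 2.25, 2, 2.25], [2.25, 0, 2.25, 2], [2, 2.25, 0, 2.25], [2.25, 2, 2.25, 0]]  # metric of 02|13

print(bmc(D_H, D_P, no_penalty))     # ('01|23', '02|13')
print(bmc(D_H, D_P, nni_distance))   # ('01|23', '01|23')
print(bmc(D_H, D_P, diagonal_only))  # ('01|23', '01|23')
```

Here the parasite data only weakly prefer 02|13 (length 4.25 versus 4.375), so a penalty of one NNI is enough for BMC to choose a matching pair.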

4 Cophylogenetic invariants 

Phylogenetic invariants are a well-studied subject in algebraic biology ([2] and 
its references), and can be generalized to cophylogenetic invariants. 

First, we would like to remind the reader about phylogenetic invariants. Let $T$ be a rooted tree with $n$ leaves and let $V(T)$ be the set of nodes of $T$. To each node $v \in V(T)$ we associate a discrete random variable $X_v$ which takes $k$ distinct states. Consider the probability $P(X_v = i)$ that $X_v$ is in state $i$. Let $\pi$ be a distribution of the random variable $X_r$ at the root node $r$. For each node $v \in V(T) \setminus \{r\}$, let $a(v)$ be the unique parent of $v$. The transition from $a(v)$ to $v$ is given by a $k \times k$ matrix $A^{(v)}$ of probabilities. Then the probability distribution at each node is computed recursively by the rule

$$P(X_v = j) = \sum_{i=1}^{k} A^{(v)}_{ij} \, P(X_{a(v)} = i). \qquad (1)$$

This rule induces a joint distribution on all the random variables $X_v$. We label the leaves of $T$ by $1, 2, \dots, n$, and we abbreviate the marginal distribution on the variables at the leaves as follows:

$$p_{i_1 i_2 \cdots i_n} = P(X_1 = i_1, X_2 = i_2, \dots, X_n = i_n). \qquad (2)$$

A phylogenetic invariant of the model is a polynomial in the leaf probabilities $p_{i_1 i_2 \cdots i_n}$ which vanishes for every choice of model parameters. The set of these polynomials forms a prime ideal in the polynomial ring over the unknowns $p_{i_1 i_2 \cdots i_n}$ (see [35] and the references therein). Furthermore, we have the following theorem.



Theorem 3 ([35]). For any group-based model on a phylogenetic tree $T$, the prime ideal of phylogenetic invariants is generated by the invariants of the local submodels around each interior node of $T$, together with the quadratics which encode conditional independence statements along the splits of $T$.
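As a small numerical illustration (not a computation of the full ideal in Theorem 3), the sketch below uses one group-based model, the 2-state symmetric model often called the Cavender-Farris-Neyman (CFN) model, on a rooted quartet with arbitrarily chosen parameters. It builds the leaf distribution (2) by summing out the hidden states, checks one leaf marginal against rule (1), and then uses the standard fact that flattening $p$ along a split of $T$ gives a matrix of rank at most $k$, so its $(k+1) \times (k+1)$ minors are phylogenetic invariants; along a non-split the rank is generically larger. The helper names are our own.

```python
import numpy as np


def cfn(p):
    """2-state symmetric (CFN) transition matrix with change probability p."""
    return np.array([[1 - p, p], [p, 1 - p]])


# Rooted quartet: root u has children leaf 0, leaf 1 and internal node w;
# w has children leaf 2 and leaf 3. Its only nontrivial split is {0,1}|{2,3}.
pi = np.array([0.5, 0.5])  # root distribution
A0, A1, A_uw, A2, A3 = cfn(0.1), cfn(0.2), cfn(0.15), cfn(0.25), cfn(0.3)

# Leaf distribution p[i0, i1, i2, i3]: sum over the hidden states of u and w.
P = np.einsum("u,ua,ub,uw,wc,wd->abcd", pi, A0, A1, A_uw, A2, A3)


def flattening(P, row_leaves):
    """4 x 4 matrix whose rows are indexed by the states of `row_leaves`."""
    cols = [leaf for leaf in range(4) if leaf not in row_leaves]
    return P.transpose(list(row_leaves) + cols).reshape(4, 4)


def rank(M):
    return np.linalg.matrix_rank(M, tol=1e-10)


print(np.isclose(P.sum(), 1.0))                     # True
print(np.allclose(P.sum(axis=(1, 2, 3)), pi @ A0))  # True: rule (1) for leaf 0
print(rank(flattening(P, (0, 1))))                  # 2: along the split of T
print(rank(flattening(P, (0, 2))))                  # 4: not a split of T
```

With $k = 2$ states, every $3 \times 3$ minor of the first flattening therefore vanishes for all parameter choices, while the second flattening is generically invertible.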

It is natural to ask whether invariants of cophylogenies can also be characterized. Fix a group-based model for gene sequence evolution. Let $p_{i_1 i_2 \cdots i_n}$ and $q_{i_1 i_2 \cdots i_n}$ be indeterminates representing leaf probabilities for host and parasite sequences, respectively. For each host tree topology $T_H$, let $I_{T_H} \subseteq \mathbb{R}[p_{i_1 i_2 \cdots i_n}]$ be the ideal of phylogenetic invariants for $T_H$. Similarly, let $I_{T_P} \subseteq \mathbb{R}[q_{i_1 i_2 \cdots i_n}]$ be the ideal of phylogenetic invariants for the parasite topology $T_P$.

Definition 5. Given host and parasite tree topologies $T_H, T_P$, the ideal of cophylogenetic invariants for $T_H, T_P$ is the intersection ideal $R' I_{T_P} \cap R' I_{T_H} \subseteq R'$, where $R' = \mathbb{R}[p_{i_1 i_2 \cdots i_n}, q_{i_1 i_2 \cdots i_n}]$.

Definition 6. Fix a space of cophylogenetic tree topologies $S \subseteq \mathcal{T}_H \times \mathcal{T}_P$. Given a tree topology $T_H$, let $S_{T_H} = \{T_P \mid (T_H, T_P) \in S\}$. The intersection

$$J_{T_H} = \bigcap_{T_P \in S_{T_H}} I_{T_P}$$

is the ideal of compatibility invariants for $T_H$ under $S$.

Ideals of cophylogenetic invariants are easily computed from Theorem 3. We hope that ideals of compatibility invariants can also be computed for particular spaces of cophylogenetic trees, without resorting to brute-force computation of intersection ideals.

5 Open problems 

In this section we conclude by summarizing open problems. 

Problem 1. Is there a generalization of Theorem 2 to $k > 1$? In other words, given a host tree $T_H$, is there a combinatorial characterization of parasite trees under $k$-interval cospeciation? How many parasite trees are possible for each host tree?

Problem 2. Study the geometry and combinatorics of various spaces of cophylo- 
genetic trees, in the spirit of [3]. 

Problem 3. Are there other interesting spaces of cophylogenetic trees which ad- 
mit linear or tropical characterizations, like the extended Four Point Condition 
for the topology diagonal? 

Problem 4. Develop distance-based methods for cophylogenetic reconstruction, and study their robustness and geometric properties analogous to [22].



Problem 5. Can we compute and/or describe some generators of compatibility ideals in a particular space of cophylogenetic trees, without resorting to a brute-force computation of intersection ideals?

Problem 6. Compute and study the face structures of joint BME polytopes for 
specific spaces of cophylogenetic trees. In particular, can their graph structures 
(vertices and edges) be determined? 

6 Proof of Theorem 2

Proof. It is clear that if $T_P$ is at most 1 NNI from $T_H$, then $T_P$ and $T_H$ satisfy 1-interval cospeciation.

To show that 1-interval cospeciation implies that $T_H$ and $T_P$ differ by at most one NNI, we use induction on the number of leaves. The base case occurs when there are 4 leaves. Then the two trees differ by at most one NNI, and the number of edges between any two leaves changes by at most 1.

If some cherry $(a, b)$ in $T_H$ is also a cherry in $T_P$, then replace this cherry with a single leaf in both trees and we are done by induction. It remains to consider the case when no cherry $(a, b)$ in $T_H$ is also a cherry in $T_P$.

1-interval cospeciation implies that $a$ and $b$ are at most 3 edges apart in $T_P$, and thus exactly 3 edges apart, since they do not form a cherry. Then, without loss of generality, $a$ forms a cherry with some subtree $S_c$ in $T_P$, and $b$ is attached just above this cherry, as shown in Fig. 4. Furthermore, we can assume, without loss of generality, that in a sequence of NNIs transforming $T_H$ into $T_P$, the NNI moving $S_c$ to between $a$ and $b$ occurs last. Let $T_c$ be the second-to-last tree in this sequence of NNIs. Since we are supposing, for contradiction, that $T_H$ and $T_P$ differ by at least two NNIs, $T_H$ and $T_c$ differ by at least 1 NNI. If $T_H$ and $T_c$ differ by exactly 1 NNI, then there are two cases. If this NNI is done about an edge in $S_c$, then some leaf $l$ in $S_c$ moves one edge closer to the root of $S_c$. This implies the number of edges between $a$ and $l$ decreases by 2 between $T_H$ and $T_P$, which is a contradiction. Otherwise, $a$, $b$, and $S_c$ are all contained in one of the subtrees involved in the interchange, say $A$ (using the notation from Fig. 3). Then this NNI moves every leaf of $A$, in particular $b$, one edge closer to each leaf in $B$, and the last NNI moves $b$ one edge closer to the root of $A$, and hence to each leaf in $B$. Thus, the number of edges between $b$ and any leaf in $B$ decreases by 2 between $T_H$ and $T_P$, which is a contradiction.

Fig. 4. The trees used in the proof of Theorem 2: $T_H$; then $T_c$, at least one NNI later, in which $a$ and $b$ form a cherry; then $T_P$, one further NNI later, in which $a$ forms a cherry with $S_c$.

Thus at least two NNIs are needed to transform $T_H$ into $T_c$. Both $T_H$ and $T_c$ contain the cherry $(a, b)$, so replace this cherry with the leaf $ab$ in $T_H$ to get the tree $T'_H$, and in $T_c$ to get the tree $T'_c$.

By the induction hypothesis on $T'_H$ and $T'_c$, there are distinct leaves $i, j$ such that

$$|d_{T'_H}(i,j) - d_{T'_c}(i,j)| > 1. \qquad (3)$$

Case ($i, j \notin S_c$ and $i, j \neq ab$) or ($i, j \in S_c$): Then $d_{T'_c}(i,j) = d_{T_c}(i,j) = d_{T_P}(i,j)$ and $d_{T'_H}(i,j) = d_{T_H}(i,j)$. Plugging into (3) gives $|d_{T_P}(i,j) - d_{T_H}(i,j)| > 1$.

Case $i \notin S_c$ and $j = ab$: Then $d_{T'_c}(i,ab) = d_{T_c}(i,a) - 1 = d_{T_P}(i,a) - 1$, and $d_{T'_H}(i,ab) = d_{T_H}(i,a) - 1$. Plugging into (3) gives $|d_{T_P}(i,a) - d_{T_H}(i,a)| > 1$.

Case $i \in S_c$, $j \notin S_c$, and $j \neq ab$: Consider the subtree of $T'_c$ spanned by $ab$, $i$, $j$, and the paths between them. Let $x$ be the number of edges between $j$ and the common ancestor of $ab$ and $i$, and let $y$ be the number of edges from the root of the subtree $S_c$ to $i$. Then $d_{T'_c}(i,j) = 1 + x + y$. Now let $V$ be the interior vertex where the paths from $ab$, $i$, and $j$ meet in $T'_H$, and let $u$, $v$, and $w$ be the numbers of edges between $V$ and $ab$, $j$, and $i$, respectively. Since $d_{T'_c}(i,j) = d_{T_c}(i,j)$ and $d_{T'_H}(i,j) = d_{T_H}(i,j)$, inequality (3) gives $|d_{T_H}(i,j) - d_{T_c}(i,j)| > 1$, which implies $d_{T_H}(i,j) \le x + y - 1$ or $d_{T_H}(i,j) \ge x + y + 3$. Now $d_{T_P}(i,j) = 2 + x + y$, so 1-interval cospeciation implies $d_{T_H}(i,j) = x + y + 3$. By definition of $v$ and $w$, $v + w = d_{T'_H}(i,j) = d_{T_H}(i,j) = x + y + 3$.

We have $d_{T_P}(a,i) = 2 + y$, $d_{T_P}(b,i) = 3 + y$, and $d_{T_H}(a,i) = d_{T_H}(b,i)$. So 1-interval cospeciation implies $d_{T_H}(a,i) = d_{T_H}(b,i) \in \{2 + y, 3 + y\}$. This implies $u + w \in \{1 + y, 2 + y\}$. We also have $d_{T_P}(a,j) = 2 + x$, $d_{T_P}(b,j) = 1 + x$, and $d_{T_H}(a,j) = d_{T_H}(b,j)$. Then 1-interval cospeciation implies $d_{T_H}(a,j) = d_{T_H}(b,j) \in \{1 + x, 2 + x\}$. This implies $u + v \in \{x, 1 + x\}$.

Then $(v + w) - (u + w) = v - u \in \{x + 2, x + 1\}$. If $v - u = x + 2$ and $u + v = x$, then $2v = 2x + 2$, so $v = x + 1$. This implies $u = -1$, which is impossible. If $v - u = x + 2$ and $u + v = 1 + x$, then $2v = 2x + 3$, which is impossible, because all variables are numbers of edges, and hence integers. If $v - u = x + 1$ and $u + v = x$, then $2v = 2x + 1$, which is also impossible because all variables are integers. Finally, if $v - u = x + 1$ and $u + v = x + 1$, then $2v = 2x + 2$, which implies $v = x + 1$ and $u = 0$; this is impossible because $ab$ is a leaf and $V$ is an interior vertex, so $u \ge 1$. Therefore $v + w = d_{T_H}(i,j) \neq x + y + 3$, which is a contradiction.

Case $i = ab$ and $j \in S_c$: Then $d_{T'_c}(ab,j) = d_{T_c}(b,j) - 1 = d_{T_P}(b,j) - 1$ and $d_{T'_H}(ab,j) = d_{T_H}(b,j) - 1$. Plugging into (3) gives $|d_{T_P}(b,j) - d_{T_H}(b,j)| > 1$.



Therefore, in every case there exist two leaves such that the number of edges between them changes by more than 1 from $T_H$ to $T_P$, which is a contradiction. □